25.4. Using Information Theory (Part 2)

Okay, so generally we have two problems in learning. One is called overfitting, which really means that we're not describing the function we actually want to describe: we're not describing F, but only the examples we have in S. And if those examples are noisy, for instance, going back to your question, then we're actually describing the noise along with it, which might be extremely complicated. That's what we call overfitting. Underfitting is when we can't capture the intended process, or the relationships, that are hidden in the data. That's always going to bug us, and those are real-world problems.
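To make the distinction concrete, here is a minimal sketch, not from the lecture, that fits polynomials of increasing degree to a small noisy sample of a simple target function: the low-degree fit underfits, while the high-degree fit chases the noise in the sample. The target function, noise level, and degrees are assumptions chosen purely for illustration.

```python
# Minimal overfitting/underfitting sketch (illustration only, not the lecture's code).
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def f(x):
    return np.sin(2 * np.pi * x)              # the "true" function F we want to describe

x_train = rng.uniform(0.0, 1.0, 12)           # the limited, noisy sample S
y_train = f(x_train) + rng.normal(0.0, 0.25, x_train.size)

x_test = np.linspace(0.0, 1.0, 200)           # the "future" we actually want to predict
y_test = f(x_test)

for degree in (1, 3, 9):
    p = Polynomial.fit(x_train, y_train, degree)
    train_mse = np.mean((p(x_train) - y_train) ** 2)
    test_mse = np.mean((p(x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# Typical outcome: degree 1 underfits (both errors high), degree 9 overfits
# (training error near zero, test error much larger), degree 3 sits in between.
```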

In 1992, I was at Carnegie Mellon University, where they had one of the first self-driving cars, and it was extremely impressive. It really could drive on small streets: with a neural net, after I think 500 yards or so of driving, it could follow the road wonderfully. A nice little neural net there. They just had one problem: they wanted to drive on the highway, and that ran into two issues, both of which were overfitting problems. At some point the neural net realized that there are these very nice white lines at the edge of the road, which meant they could only drive in the left lane of the highway, because otherwise this car, which was actually an army truck, would take every exit. It forgot everything else it had learned, that there are trees and other cars, and just started following that white line. Slight problem there. Or take the white lines, or even the, what are they called, the guardrails, that they sometimes have along the highway in the US. Once those stop, and sometimes they do, the car, since it had learned that these guardrails are the best thing to follow, would lose all of the information it had overfitted to and basically cry out in panic and stop, because it couldn't navigate anymore. Those are typical overfitting effects. Rather than learning how to drive, this neural network was learning how to follow either white lines or guardrails. But it learned that totally autonomously, on its own, and became better and better at following the guardrails. So what these people actually did was to blur the right and the left field of the video signal the network was following. It's the same thing they do with horses, right? They get these blinders so they don't get spooked by cars overtaking them. That's essentially what they did here: they basically made the data noisier so that the car could concentrate on the right thing and wouldn't overfit. Okay. So overfitting is an inherent problem, whereas underfitting you can usually cure with more data.

So what can we do, say, in decision tree learning? What you usually want is to take the decision trees you have learned and then generalize them so that they become less overfitted. If a tree is very deep and elaborate, it may make decisions that are only fitted to the particular subset of data it has seen. Remember, we have a process or some kind of mechanism we want to make predictions about, and we have only a limited sample, the examples we've seen so far. But there's always the future, which is what we actually want to predict. So it might be a good idea not to overfit, and one thing we can do is go over these decision trees again and generalize them so that they become better, less overfitted.

The obvious idea in decision trees, where small trees are beautiful, is to go through the tree again, look at the nodes, and in some situations you get the feeling: I don't need to make that decision, it's a useless decision. Then you throw out that test node and move the information we have there up the tree. That's called decision tree pruning. We do it on the terminal test nodes, the nodes in our tree that only have decision leaves under them. We test whether such a node is irrelevant, which in our system means it has a very low information gain. If the information gain is low, we're going to be making decisions that have a very, very small empirical basis, if you will. Then you just replace that node by a leaf: you count the examples and take the mode, the majority classification, again.
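To make this concrete, here is a minimal sketch, my own illustration rather than the lecture's code, of this kind of pruning: for every terminal test node, a node whose children are all leaves, we compute the information gain of its test over the examples that reach it, and if that gain falls below a threshold we replace the node by a leaf labelled with the mode of those examples. The tree representation and the threshold are assumptions made for the sketch.

```python
# Minimal decision tree pruning sketch (illustration only, not the lecture's code).
import math
from collections import Counter

class Leaf:
    def __init__(self, examples):
        self.examples = examples                      # list of (attributes, label) pairs
        self.label = Counter(y for _, y in examples).most_common(1)[0][0]  # the mode

class Node:
    def __init__(self, attribute, children):
        self.attribute = attribute                    # attribute tested at this node
        self.children = children                      # dict: attribute value -> subtree

def entropy(examples):
    counts = Counter(y for _, y in examples)
    total = len(examples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def information_gain(node):
    """Gain of the test at a terminal test node, from the examples in its leaves."""
    examples = [e for leaf in node.children.values() for e in leaf.examples]
    remainder = sum(len(leaf.examples) / len(examples) * entropy(leaf.examples)
                    for leaf in node.children.values())
    return entropy(examples) - remainder

def prune(tree, threshold=0.01):
    """Replace terminal test nodes with low information gain by a leaf
    labelled with the mode of the examples that reach them."""
    if isinstance(tree, Leaf):
        return tree
    tree.children = {v: prune(child, threshold) for v, child in tree.children.items()}
    if all(isinstance(child, Leaf) for child in tree.children.values()):
        if information_gain(tree) < threshold:
            merged = [e for leaf in tree.children.values() for e in leaf.examples]
            return Leaf(merged)                       # count the examples, take the mode
    return tree
```

In practice the irrelevance test is often a statistical significance test rather than a fixed threshold on the gain; the threshold here just keeps the sketch short.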

